
    Reviewing Natural Language Processing Research

    International audience. This tutorial will cover the goals, processes, and evaluation of reviewing research in natural language processing. As leading figures in our community have pointed out for years (Webber, 2007), researchers in the ACL community face a heavy, and growing, reviewing burden. Initiatives to lower this burden were discussed at the recent ACL general assembly in Florence (ACL 2019). At the same time, notable "false negatives", that is, rejections by our conferences of work that was later shown to be tremendously important after acceptance by other conferences (Church, 2005), have raised awareness that our reviewing practices leave something to be desired. We do not often talk about "false positives" with respect to conference papers, but conversations in the hallways at *ACL meetings suggest that we have a publication bias towards papers that report high performance, with perhaps not much else of interest in them (Manning, 2015). It need not be this way. There is good reason to think that reviewing is a learnable (and teachable)…

    War of Ontology Worlds: Mathematics, Computer Code, or Esperanto?

    The use of structured knowledge representations (ontologies and terminologies) has become standard in biomedicine. Definitions of ontologies vary widely, as do the values and philosophies that underlie them. To make these views explicit, we conducted and summarized interviews with a dozen leading ontologists. Their views clustered into three broad perspectives, which we summarize as mathematics, computer code, and Esperanto. Ontology as mathematics puts the ultimate premium on rigor and logic, on symmetry and consistency of representation across scientific subfields, and on the inclusion of only established, non-contradictory knowledge. Ontology as computer code focuses on utility and cultivates diversity, fitting ontologies to their purpose. Just as there are diverse computer languages such as C++, Prolog, and HTML, the code perspective holds that diverse applications warrant custom-designed ontologies. Ontology as Esperanto focuses on facilitating cross-disciplinary communication, knowledge cross-referencing, and computation across datasets from diverse communities. We show how these views align with classical divides in science and suggest how a synthesis of their concerns could strengthen the next generation of biomedical ontologies.

    Paradigms of evaluation in natural language processing: Field linguistics for glass box testing

    Although software testing has been well studied in computer science, it has received little attention in natural language processing. Nonetheless, a fully developed methodology for glass box evaluation and testing of language processing applications already exists in the field methods of descriptive linguistics. This work lays out a number of experiments that, in the aggregate, demonstrate the feasibility of software testing or glass box evaluation for natural language processing, and in the process validate the claim that the techniques and field methods of descriptive linguistics are a sound methodological approach to such testing. Various chapters consider the issue from several perspectives: the application of fieldwork techniques to software testing, the application of linguistics-informed software engineering to NLP, the application of the descriptive linguistics concept of complementary distribution to problems in NLP, and the application of descriptive linguistics concepts to quality assurance for semantic representations in proposition banks. In the experiment that most clearly shows the connection between linguistic fieldwork and software testing, a test suite constructed like a field linguist's elicitation schedule is used to find performance errors in five named entity recognition programs and to predict the performance of one program on several equivalence classes of named entities. In another experiment, from the software engineering perspective, a linguistically informed fault model is used to isolate the source of a performance anomaly in a language processing application. In three subsequent experiments, a discovery procedure for minimal pairs and free variation is used to approach a problem in the normalization of named entities, and a discovery procedure for complementary distribution is used to diagnose problematic semantic representations. The latter technique is applied to two corpora and two sets of predicate-argument structures, and it is shown to label true positives with an accuracy of 69%.

    Amazon Mechanical Turk: Gold Mine or Coal Mine?

    Last Words editorial for Computational Linguistics. International audience.

    Vested Volunteer Annotators and the Ethics of Annotating Suicide Notes

    National audience. According to the World Health Organization, 800,000 people die by suicide every year, and about 20% of them leave a written message. This paper presents an ethical perspective on the project described in (Pestian et al., 2012b), which produced a corpus of such messages annotated for the emotions expressed in the notes. The annotators were family members or friends of someone who had died by suicide, or mental health professionals. We refer to these non-coercively and altruistically motivated annotators as vested volunteers. A number of ethical issues raised by this task and this group of annotators are explored, including the role of empathy, possible effects on the annotators, and the uses that might be made of the products of the annotation project. We conclude by considering the project from the point of view of the Ethics and Big Data Charter.